Energy Minimization in UAV-Aided Networks: Actor-Critic Learning for Constrained Scheduling Optimization
Authors
Abstract
In unmanned aerial vehicle (UAV) applications, the UAV's limited energy supply and storage have triggered the development of intelligent energy-conserving scheduling solutions. In this paper, we investigate energy minimization for UAV-aided communication networks by jointly optimizing data-transmission scheduling and UAV hovering time. The formulated problem is combinatorial and non-convex with bilinear constraints. To tackle the problem, we first provide an optimal algorithm (OPT) and a golden section search heuristic (GSS-HEU). Both solutions serve as offline performance benchmarks but might not be suitable for online operation. To this end, from a deep reinforcement learning (DRL) perspective, we propose an actor-critic-based deep stochastic online scheduling (AC-DSOS) algorithm and develop a set of approaches to confine the action space. Compared to conventional RL/DRL, the novelty of AC-DSOS lies in handling two major issues, i.e., an exponentially increased action space and infeasible actions. Numerical results show that AC-DSOS is able to provide feasible solutions and saves around 25-30% energy compared to conventional AC-DRL algorithms. Compared to the developed GSS-HEU, AC-DSOS consumes around 10% higher energy but reduces the computational time from second-level to millisecond-level.
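To make the GSS-HEU benchmark concrete: golden section search shrinks a bracket around the minimizer of a unimodal scalar function, reusing one interior point per iteration. The sketch below is a generic Python implementation of that search; the energy curve and all constants are hypothetical stand-ins for the paper's hovering-time energy model, not the authors' formulation.

```python
import math

# Golden-section search: shrink a bracket [a, b] around the minimizer of a
# unimodal scalar function, keeping one interior point between iterations.
INV_PHI = (math.sqrt(5.0) - 1.0) / 2.0  # 1/phi ~= 0.618

def golden_section_min(f, a, b, tol=1e-6):
    c = b - INV_PHI * (b - a)
    d = a + INV_PHI * (b - a)
    while (b - a) > tol:
        if f(c) < f(d):          # minimizer lies in [a, d]
            b, d = d, c
            c = b - INV_PHI * (b - a)
        else:                    # minimizer lies in [c, b]
            a, c = c, d
            d = a + INV_PHI * (b - a)
    return (a + b) / 2.0

# Hypothetical energy curve: longer hovering costs hovering power, shorter
# hovering forces higher transmit power, giving a unimodal trade-off.
energy = lambda t: 50.0 * t + 120.0 / t
t_star = golden_section_min(energy, 0.1, 10.0)
print(f"hovering time ~ {t_star:.3f} s, energy ~ {energy(t_star):.1f} J")
```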
Similar Resources
Actor-Critic Reinforcement Learning with Neural Networks in Continuous Games
Reinforcement learning agents with artificial neural networks have previously been shown to acquire human level dexterity in discrete video game environments where only the current state of the game and a reward are given at each time step. A harder problem than discrete environments is posed by continuous environments where the states, observations, and actions are continuous, which is what th...
Actor-Critic Reinforcement Learning with Energy-Based Policies
We consider reinforcement learning in Markov decision processes with high dimensional state and action spaces. We parametrize policies using energy-based models (particularly restricted Boltzmann machines), and train them using policy gradient learning. Our approach builds upon Sallans and Hinton (2004), who parameterized value functions using energy-based models, trained using a non-linear var...
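The energy-based policy idea in this entry is easy to sketch: the policy assigns each action probability proportional to exp(-E(s, a)). The bilinear energy function, dimensions, and temperature below are illustrative assumptions; the cited work parametrizes E with a restricted Boltzmann machine rather than this toy form.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy bilinear energy E(s, a) = -s . w_a over three discrete actions;
# the cited work uses a restricted Boltzmann machine for E instead.
W = rng.normal(size=(4, 3))  # state dim 4, one weight column per action

def boltzmann_policy(state, temperature=1.0):
    """Return pi(a|s) with pi proportional to exp(-E(s, a))."""
    energies = -state @ W                 # E(s, a) for each action
    logits = -energies / temperature
    logits -= logits.max()                # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

state = rng.normal(size=4)
probs = boltzmann_policy(state)
action = rng.choice(3, p=probs)           # sample an action from the policy
print(probs.round(3), action)
```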
Actor-Critic Policy Learning in Cooperative Planning
In this paper, we introduce a method for learning and adapting cooperative control strategies in real-time stochastic domains. Our framework is an instance of the intelligent cooperative control architecture (iCCA). The agent starts by following the “safe” plan calculated by the planning module and incrementally adapting its policy to maximize the cumulative rewards. Actor-critic and consensusb...
An Actor-critic Algorithm for Learning Rate Learning
Stochastic gradient descent (SGD), which updates the model parameters by adding a local gradient times a learning rate at each step, is widely used in model training of machine learning algorithms such as neural networks. It is observed that the models trained by SGD are sensitive to learning rates and good learning rates are problem specific. To avoid manually searching for learning rates, whic...
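The update this entry builds on is plain SGD, theta <- theta - eta * grad L(theta), and its outcome is highly sensitive to eta. A minimal sketch under an assumed quadratic loss (all constants hypothetical) shows convergence flip to divergence as eta crosses the stability threshold 2 / lambda_max:

```python
import numpy as np

def sgd(theta, grad_fn, lr, steps=100):
    # Plain SGD: subtract the gradient scaled by the learning rate each step.
    for _ in range(steps):
        theta = theta - lr * grad_fn(theta)
    return theta

# Assumed quadratic loss L(theta) = 0.5 * ||A @ theta||^2 with gradient
# A.T @ A @ theta; largest curvature is 100, so stability needs lr < 0.02.
A = np.diag([1.0, 10.0])
grad = lambda theta: A.T @ A @ theta
theta0 = np.ones(2)

for lr in (0.001, 0.019, 0.021):
    print(lr, sgd(theta0, grad, lr))  # 0.021 crosses 2/lambda_max, diverges
```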
Dynamic Control with Actor-Critic Reinforcement Learning
Contents excerpt: 4 Actor-Critic Marble Control; 4.1 R-code; 4.2 The critic; 4.3 Unstable actors; 4.4 Trading off stability against...
Journal
Journal title: IEEE Transactions on Vehicular Technology
Year: 2021
ISSN: 0018-9545, 1939-9359
DOI: https://doi.org/10.1109/tvt.2021.3075860